Deep learning for blind structured illumination microscopy

Blind-structured illumination microscopy (blind-SIM) enhances optical resolution without requiring nonlinear effects or pre-defined illumination patterns. It is thus advantageous in experimental conditions where toxicity or biological fluctuations are an issue. In this work, we introduce a custom convolutional neural network architecture for blind-SIM: BS-CNN. We show that BS-CNN outperforms other blind-SIM deconvolution algorithms, providing a resolution improvement by a factor of 2.17 together with a very high Fidelity (artifact reduction). Furthermore, BS-CNN proves to be robust to cross-database variability: it is trained on synthetically augmented open-source data and evaluated on experiments. This approach paves the way to the use of CNN-based deconvolution in all scenarios in which a statistical model for the illumination is available while the specific realizations are unknown or noisy.


Scientific Reports | (2022) 12:8623 | https://doi.org/10.1038/s41598-022-12571-0 | www.nature.com/scientificreports/

[...] theory framework, in which a first encoding step generates a high-dimensional dense feature representation of the input, and a second decoding step upsamples the information into the desired representation 29. In the encoding stage (see Fig. 1, panel a), BS-CNN encodes the input in a dense feature representation by applying two-dimensional 3 × 3 convolutional layers of feature size 32, 64, 128 and 512 respectively, each followed by an element-wise ReLU layer and a Max Pool layer. In the decoding stage, the 512 dense features are decoded to the original input size by three 3 × 3 two-dimensional deconvolutional layers of feature size 128, 64 and 32 respectively, each followed by an element-wise ReLU layer and an upsampling layer. The aim of this CNN is to reconstruct, from low-resolution fluorescence images, a high-resolution spatial distribution of the fluorophore density ρ.

To perform the training, we designed an original protocol to augment images from a publicly available open-source data-set 25 with synthetically generated illumination patterns. More precisely, we started from fluorescence microscopy images of biological targets, which we used as high-resolution ground truth (GT) target images. They represent the spatial distribution of the fluorophore molecules ρ. We pre-processed images from the repository by randomly extracting 10000 patches, each with a field of view 11 times larger than the Point Spread Function (PSF) of the optical system. We used 7500 of these patches for training, 2000 for validation, and 500 for testing. Then we "illuminated" the patches with speckle patterns, as required for the standard blind-SIM approach. Speckle patterns 30, which are light structures generated on a laser beam reflected by a rough surface, occur in the presence of disordered phase patterns.
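The patch extraction and the 7500/2000/500 split described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the random stand-in image, the PSF width in pixels (`psf_px`), and the reading of "11 times larger than the PSF" as a patch side of 11 PSF widths are all assumptions.

```python
import numpy as np

def extract_patches(image, n_patches, patch_size, rng):
    """Cut n_patches square patches at uniformly random positions."""
    h, w = image.shape
    ys = rng.integers(0, h - patch_size + 1, size=n_patches)
    xs = rng.integers(0, w - patch_size + 1, size=n_patches)
    return np.stack([image[y:y + patch_size, x:x + patch_size]
                     for y, x in zip(ys, xs)])

def split_patches(patches, n_train, n_val, n_test, rng):
    """Shuffle the patches and split them into train/validation/test sets."""
    assert n_train + n_val + n_test == len(patches)
    idx = rng.permutation(len(patches))
    train = patches[idx[:n_train]]
    val = patches[idx[n_train:n_train + n_val]]
    test = patches[idx[n_train + n_val:]]
    return train, val, test

rng = np.random.default_rng(0)
image = rng.random((256, 256)).astype(np.float32)  # stand-in for a repository image
psf_px = 2                                         # hypothetical PSF width in pixels
patches = extract_patches(image, 10000, 11 * psf_px, rng)
train, val, test = split_patches(patches, 7500, 2000, 500, rng)
```

In practice the patches would be drawn from many repository images rather than a single one; the split logic stays the same.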
We numerically generated synthetic speckle patterns I(r) using a model that sums plane waves with random phases, random amplitudes, and a cutoff spatial frequency, which lets us control the average speckle grain size d_sp (see Supplementary Materials). In a typical blind-SIM experiment, a few hundred different illumination realizations (N = 600) are acquired to extract a high-resolution image. High-resolution fluorescence signal data are obtained simply by multiplying I · ρ. Note that, to ensure generality, each patch and illumination appears only once in the dataset. This synthetically augmented data-set contains sufficient variability to incorporate the physical properties of the illumination patterns in the deconvolution process.

Figure 1. (a) The BS-CNN architecture: for the encoder-decoder architecture we use two-dimensional convolutional layers of size 3 × 3 followed by an element-wise ReLU non-linearity. The feature number is first increased (encoder) by a factor of two from 32 to 512 while the image size is decreased by a 4 × 4 max pool layer. Then the opposite procedure is applied (decoder) up to the original image size. Finally, a one-dimensional convolutional layer is applied to produce a single-channel image. (b) Data handling: a ground truth neuron image ρ is selected from an open microscopy repository 25; the ground truth image is illuminated 600 times by different speckle realizations I, producing 600 high-frequency images; we convolve with an Airy disk point spread function (PSF) whose width is η times larger than the speckle correlation size and obtain 600 low-resolution frames corresponding to the same GT. By subtracting the low-resolution mean ⟨LR⟩ from each low-resolution frame LR_n and keeping only the positive part, we obtain the HP images.

The low-resolution images are then generated from the high-resolution ones by applying a blurring convolutional kernel with the size of the collection point spread function, d_PSF. This convolution operation mimics the information loss due to a collection objective with a limited numerical aperture (see Supplementary Materials). The ratio between d_PSF and d_sp is the parameter driving the effectiveness of the deconvolution. In a standard experiment, performed with a single objective for both illumination and collection, the focusing and collection PSFs are identical (apart from a small factor due to wavelength differences), that is, d_PSF = d_sp.
However, if two different optics are employed, d_PSF and d_sp can differ (see Supplementary Materials). In fact, the Scattering Assisted Imaging (SAI) technique 11 takes advantage of a reduced d_sp to improve resolution. We therefore studied the general case in which d_sp ≤ d_PSF, that is, η = d_PSF / d_sp ≥ 1.
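The forward model described above (speckle illumination, multiplication by ρ, then blurring with the collection PSF) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the model from the Supplementary Materials: a Gaussian kernel stands in for the Airy disk, the cutoff frequency is set to 0.5 / d_sp as a rough proxy for the grain size, and only 8 frames are generated instead of N = 600. All function and variable names are hypothetical.

```python
import numpy as np

def speckle_pattern(n, cutoff, n_waves, rng):
    """Speckle intensity from random plane waves with |k| <= cutoff (cycles/pixel)."""
    y, x = np.mgrid[0:n, 0:n]
    field = np.zeros((n, n), dtype=complex)
    for _ in range(n_waves):
        k = cutoff * np.sqrt(rng.random())       # radius, uniform over the disk
        theta = 2 * np.pi * rng.random()         # propagation direction
        kx, ky = k * np.cos(theta), k * np.sin(theta)
        amplitude = rng.random()                 # random amplitude
        phase = 2 * np.pi * rng.random()         # random phase
        field += amplitude * np.exp(1j * (2 * np.pi * (kx * x + ky * y) + phase))
    intensity = np.abs(field) ** 2
    return intensity / intensity.mean()          # normalize to unit mean

def gaussian_blur(image, fwhm):
    """Circular convolution with a Gaussian PSF, applied in Fourier space."""
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    sigma = fwhm / 2.355
    otf = np.exp(-2 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * otf))

rng = np.random.default_rng(0)
n, n_frames = 64, 8               # small demo; the text uses N = 600 frames
d_sp, eta = 4.0, 2.5              # speckle grain size (pixels) and eta = d_PSF / d_sp
d_psf = eta * d_sp
rho = rng.random((n, n))          # stand-in for a ground-truth patch
illum = np.stack([speckle_pattern(n, 0.5 / d_sp, 100, rng) for _ in range(n_frames)])
hr_stack = illum * rho                                          # high-resolution signal I * rho
lr_stack = np.stack([gaussian_blur(f, d_psf) for f in hr_stack])  # low-resolution frames
```

Because the kernel has unit gain at zero frequency, the blur preserves the total signal of each frame and only removes high spatial frequencies.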
Before submitting data to the CNN for training, the High Positive part of the intensity of the signal, HP, is extracted from the low-resolution fluorescence signal:

$$\mathrm{HP}_n = \left(\mathrm{LR}_n - \langle \mathrm{LR} \rangle\right)_+$$

where $(\cdot)_+$ denotes that we keep only the matrix elements that are above zero and LR_n is a set of N low-resolution images with average ⟨LR⟩. This data handling sparsifies the low-resolution images so that the algorithm makes use of the sparse nature of high-intensity speckles (see 11 and Supplementary Materials). The model is trained using the Adam 31 algorithm, with exponential decay rate for the 1st moment β_1 = 0.9 and exponential decay rate for the 2nd moment β_2 = 0.999. Training runs for 120 epochs with a learning rate lr = 0.001, using the structural similarity index (SSIM) as the loss (see Supplementary Materials and 32). We stop the training when the validation loss no longer improves. We found that the best result is obtained with a batch size of 32.
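The HP extraction step can be illustrated with a few lines of NumPy. This sketch assumes that the average is taken pixel by pixel over the N frames of one stack; the function name and the toy data are hypothetical.

```python
import numpy as np

def high_positive(lr_stack):
    """Subtract the pixel-wise mean over frames and keep only positive values."""
    lr_stack = np.asarray(lr_stack, dtype=float)
    mean_frame = lr_stack.mean(axis=0, keepdims=True)   # average over the N frames
    return np.clip(lr_stack - mean_frame, 0.0, None)    # the (.)_+ operation

# Toy stack: 3 frames of 1 x 2 pixels, pixel-wise mean is [3, 2].
lr_stack = np.array([[[1.0, 4.0]],
                     [[3.0, 2.0]],
                     [[5.0, 0.0]]])
hp = high_positive(lr_stack)
```

Most entries of `hp` are zero, which is the sparsification that the text refers to.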
Resolution measurement and fidelity. In Figs. 2 and 3 we show the performance of the BS-CNN algorithm, averaged over 1000 illuminations, and compare it with the SAI algorithm, a custom local-search deconvolution algorithm that exploits the properties of the illumination statistics (see Supplementary Materials and 11), and with the Lucy Richardson algorithm, fixing η = 2.5.
In Fig. 2a we show the Siemens Star object, a standard target for extracting image resolution. It has been degraded to obtain the low-resolution image (image on the right, LR) and then super-resolved with (from left to right) BS-CNN, SAI, and Lucy Richardson. The colored rings centered on each star mark the radial value R at which the Rayleigh criterion is satisfied. BS-CNN shows the best resolution. Resolution enhancement is indeed [...] figure (Fig. 3 panel b), we show two objects lying closer than the PSF resolution d_PSF, so that they are not resolved in the low-resolution image. BS-CNN accurately finds both the position of the maxima and their relative intensity, while SAI and Lucy Richardson misplace the left maximum and retrieve it with a lower intensity.

Finally, we validate the BS-CNN approach on fully experimental data in Fig. 4, where speckles are generated with a Digital Micromirror Device (DMD), with a random pattern controlling the illumination laser wavefront. Different illuminations are successively delivered to the sample by recasting the random pattern. To control the η parameter, the collection PSF (and in particular the d_PSF parameter) has been tuned with an optical iris placed between the collection objective and the eyepiece 11. In Fig. 4 we again compare BS-CNN, SAI and Lucy Richardson for η = 2.5 (maximum distance between distinguishable objects of 2.37 µm) and with N = 600 illuminations. As we observe from the plot, BS-CNN can resolve objects down to a distance of 0.975 µm, which is about 2.5 times smaller than the low-resolution limit, outperforming Lucy Richardson and SAI.
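As an illustration of how a Rayleigh-type test can be applied to an intensity profile across two neighboring objects, the sketch below checks whether the dip between the two highest maxima falls to roughly 73.5% of the lower maximum, the textbook value for two equal Airy patterns at the Rayleigh distance. The exact procedure used on the Siemens Star is not given in this section, so the function, the threshold argument, and the Gaussian test profiles are assumptions for illustration only.

```python
import numpy as np

def is_resolved(profile, dip_ratio=0.735):
    """Return True if the two highest peaks are separated by a deep enough valley."""
    p = np.asarray(profile, dtype=float)
    interior = np.arange(1, len(p) - 1)
    # Local maxima: higher than the left neighbor, not lower than the right one.
    peaks = interior[(p[1:-1] > p[:-2]) & (p[1:-1] >= p[2:])]
    if len(peaks) < 2:
        return False                        # a single maximum: not resolved
    left, right = np.sort(peaks[np.argsort(p[peaks])[-2:]])
    valley = p[left:right + 1].min()        # lowest value between the two peaks
    return bool(valley <= dip_ratio * min(p[left], p[right]))

def two_point_profile(separation, sigma=1.0):
    """Sum of two equal Gaussian spots, a stand-in for two blurred emitters."""
    x = np.linspace(-5.0, 5.0, 501)
    return (np.exp(-(x - separation / 2) ** 2 / (2 * sigma ** 2))
            + np.exp(-(x + separation / 2) ** 2 / (2 * sigma ** 2)))

far = is_resolved(two_point_profile(4.0))    # well separated spots
near = is_resolved(two_point_profile(1.0))   # spots merged into one maximum
```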

Discussion
In summary, we trained and studied a CNN able to deconvolve blind-SIM image stacks over a wide range of η values, and which is therefore effective in different experimental configurations. BS-CNN has been trained on images of biological samples (cell images) taken from a repository and "illuminated" by synthetic optical patterns generated with a model for speckles, and it has also been tested on real experimental data from biological samples. Even though the testing conditions are very different from those of the training, BS-CNN confirms its effectiveness and robustness to data variability. We found that BS-CNN outperforms other blind-SIM deconvolution algorithms and achieves an average resolution improvement of 2.17 with respect to the low-resolution images, reaching up to 2.5 in specific spots. This improved performance is probably due to the neural network's generalization capability. BS-CNN also produces images closer to the ground truth, as demonstrated quantitatively by the Fidelity parameter, avoiding granularity or nonlinear effects. Moreover, we show an original approach to CNN training that couples a model of the illumination with experimental data. This strategy may be exported beyond microscopy, to all experimental configurations in which a model of the illumination is available, thus extending the data set available for training without degrading the final performance or the Fidelity.

Methods
Microscope details. A continuous-wave (CW) laser emits light at the wavelength λ = 638 nm. A first telescope, made of two lenses with focal lengths f_1 = 7 cm and f_2 = 20 cm, enlarges the beam. The light is then modulated by a Digital Micromirror Device (DMD) displaying a random binary mask to generate different speckle illuminations. This plane is imaged onto the sample through a 4f lens system made of a lens (f = 20 cm) and an objective (numerical aperture NA = 0.75). The objective collects the reflected light and the fluorescence, which are then imaged by a third lens (f = 20 cm) onto the camera (ORCA-Flash4.0 V3 Digital CMOS camera). A dichroic mirror and a long-pass spectral filter select the fluorescence signal. The camera's pixel size is 6.5 µm, while the overall magnification is 40. The spatial resolution of the microscope can be degraded by means of an iris placed in front of the third lens. The sample is composed of retina neurons prepared with Alexa Fluor 533 fluorophores. The fluorescence measurements have been performed by illuminating the sample with 600 speckle illuminations and an exposure time of 0.2 s.
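A few derived quantities follow directly from the microscope parameters quoted above; the short calculation below is a worked example rather than part of the original methods. The Rayleigh estimate uses the excitation wavelength and the standard 0.61 λ / NA formula as a rough reference only, since the fluorescence emission wavelength is not stated here. Variable names are hypothetical.

```python
# Parameters quoted in the text.
wavelength_um = 0.638        # laser wavelength, 638 nm
numerical_aperture = 0.75
camera_pixel_um = 6.5
magnification = 40
f1_cm, f2_cm = 7.0, 20.0     # telescope focal lengths

# Size of one camera pixel projected back onto the sample plane.
pixel_at_sample_um = camera_pixel_um / magnification       # 0.1625 um

# Beam expansion factor of the first telescope.
beam_expansion = f2_cm / f1_cm                              # about 2.86

# Rough diffraction-limited resolution (Rayleigh formula, excitation wavelength).
rayleigh_um = 0.61 * wavelength_um / numerical_aperture    # about 0.52 um

print(pixel_at_sample_um, beam_expansion, rayleigh_um)
```

With these numbers the diffraction-limited spot spans roughly three pixels at the sample plane.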
Scattering assisted imaging (SAI) deconvolution. The task of the deconvolution algorithm is to find the fluorophore density ρ(r) given the low-resolution set LR_n(r). In SAI each LR_n(r) is approximated with a synthetic frame g_n(r) composed of a sum of K Airy disks h(r) that have a size of d_sp and are located at random positions r_nk,

$$g_n(\mathbf{r}) = \sum_{k=1}^{K} P_{nk}\, h(\mathbf{r} - \mathbf{r}_{nk}),$$

where P_nk is a properly normalized multiplying factor that follows an exponential distribution. SAI finds the most probable configuration for the support r_nk for each frame by a local search process that minimizes the mean absolute error between the HP_n and the convoluted frame of g_n(r)
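The local search idea can be sketched as a greedy random search over the disk positions. This is an illustrative sketch only, not the SAI implementation of ref. 11: Gaussian spots stand in for the Airy disks, the weights are scaled once so that the model matches the total signal of the target, and the proposal step and iteration count are arbitrary. All names are hypothetical.

```python
import numpy as np

def render_frame(positions, weights, shape, d_sp):
    """g(r) = sum_k P_k h(r - r_k), with a Gaussian stand-in for the Airy disk h."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    sigma = d_sp / 2.355
    g = np.zeros(shape)
    for (y0, x0), p in zip(positions, weights):
        g += p * np.exp(-((yy - y0) ** 2 + (xx - x0) ** 2) / (2 * sigma ** 2))
    return g

def blur(image, d_psf):
    """Circular convolution with a Gaussian collection PSF, done in Fourier space."""
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    sigma = d_psf / 2.355
    otf = np.exp(-2 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * otf))

def mae(a, b):
    return float(np.mean(np.abs(a - b)))

def local_search(hp, k, d_sp, d_psf, n_iter, rng, step=1.0):
    """Greedy search: move one disk at a time, keep the move if the MAE decreases."""
    shape = np.array(hp.shape)
    positions = rng.random((k, 2)) * shape
    weights = rng.exponential(1.0, size=k)            # exponentially distributed P_k
    model = blur(render_frame(positions, weights, hp.shape, d_sp), d_psf)
    scale = hp.sum() / model.sum()                     # assumed normalization
    weights, model = weights * scale, model * scale
    initial_err = err = mae(hp, model)
    for _ in range(n_iter):
        trial = positions.copy()
        j = rng.integers(k)
        trial[j] = np.clip(trial[j] + rng.normal(0.0, step, size=2), 0, shape - 1)
        trial_err = mae(hp, blur(render_frame(trial, weights, hp.shape, d_sp), d_psf))
        if trial_err < err:                            # accept only improvements
            positions, err = trial, trial_err
    return positions, weights, initial_err, err

rng = np.random.default_rng(0)
true_pos = np.array([[8.0, 8.0], [16.0, 22.0], [24.0, 10.0]])
true_w = np.array([1.0, 0.5, 2.0])
target = blur(render_frame(true_pos, true_w, (32, 32), 2.0), 5.0)  # stand-in for HP_n
positions, weights, initial_err, final_err = local_search(
    target, k=3, d_sp=2.0, d_psf=5.0, n_iter=200, rng=rng)
```

Because only improving moves are accepted, the final error can never exceed the initial one; whether the search reaches the true positions depends on the starting point and on the number of iterations.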